\section*{Google PageRank}

Google has become the most used search engine in a very small amount of time
after its creation. This success depends on the accuracy of the results that
was highly better than the other search engines. Google generates its own
\emph{ranking} of the importance of the web pages in order to sort the results
of a query. This ranking is what the user is interested in.

The algorithm that sorts web pages based on their importance is called
\emph{PageRank} and it has been developed by S.~Brin and L.~Page at Stanford
University \cite{Brin1998}. The basic principle is as follows
\begin{itemize}

    \item if a web page A has a \emph{link} to a page B, this is interpreted as
    a vote by A in favour of B, so B's place in the ranking is raised;

    \item the voters are not all the same: the vote from someone who is high in
    the ranking (someone who has many votes) weights more than someone who is
    low-ranked.

\end{itemize}

\begin{figure}
    \centering
    \includegraphics[width=0.6\textwidth]{./fig/PageRanks_Example_only_letters}
    \caption{Schematic diagram of the \emph{ranking}: every sphere represents
    a web page, every arrow a link. The dimension of the sphere measures the
    importance of the page.}
    \label{fig:PR}
\end{figure}

This principle is depicted in Fig.~\ref{fig:PR}. Page B receives many links and
therefore it is considered very important. Page C is more important than page E,
even if it has less links, since a vote from B matters more than many small
links that E gets (i.e. if the \emph{Washington Post} or the
\emph{New York Times} would publish a link to the website for the PACS course,
this link would count more than the links coming from the personal page of the
professor, his assistant and the students of the course!).

In a more formal way, the \emph{ranking} $r(p)$ of the page $p$ is defined as
the sum of all the \emph{rankings} $r(q)$ of all the pages $q$ that link to $p$
\begin{align}
    \label{eq:rank}
    r(p)=\sum_{q \rightarrow p} \frac{r(q)}{\#q}
\end{align}
The sum is weighted with $\#q$, the number of links in the page $q$ (if the page
$q$ has only one link towards $p$ it is likely that an interested reader would
follow it, on the other hand, if the page $q$ has many links, it is less likely
that the reader would choose the link to $p$).
It is simple to notice that this is an implicit problem: in order to compute the
ranking of $p$ we need the ranking of the other pages, that are in turn based on
the ranking of $p$.

The problem is simplified by writing it in a matricial form. Let $\{p_1, p_2,
\dots, p_N\}$ be all the pages and $A \in \mathbb{R}^{N \times N}$ be the
connection matrix, with $a_{i,j}$ given by
\begin{align}
      \label{eq:harmonicp}
      a_{i,j}=
      \begin{cases}
        \dfrac{1}{\#p_j} & \text{if there is a link from $p_j$ to $p_i$} \\[1ex]
        0 &  \mbox{otherwise}
      \end{cases}
\end{align}
Note that the $a_{i,j}$ can be interpreted as a probability distribution, in
particular they decsribe the probability that a person who clicks randomly
walks from the page $p_j$ to the page $p_i$. The matrix $A$ has the following
property
\begin{align} \label{eq:sum1}
    \sum_{i=1}^{N} a_{i,j}=1
\end{align}

If the ranking of the pages $p_i$ is represented by the column vector $\vv{r} =
[r_1, r2, \dots, r_N]^T$, called \emph{PageRank}, \eqref{eq:rank} is equivalent
to
\begin{align*}
    \vv{r}=A \vv{r}
\end{align*}
Therefore, the \emph{PageRank} is the eigenvector that corresponds to the
eigenvalue $1$ of the associated eigenvalue problem. It can be demonstrated
that, if the $\lambda_i$ are the eigenvalues of $A$, then $|\lambda_i| \leq 1$.
Moreover, $\lambda_1=1$ has molteplicity one\footnote{A sufficient condition is
given by the Perron Frobenius theorem, that requires a matrix that corresponds
to an irriducible graph. A trick to satisfy the hypothesis could be to assign
a link to every existing page without links.}.

Since the number $N$ of analyzed pages is in the order of the tens of billions,
the computation of the \emph{PageRank} eigenvector with a direct method is
computationally expensive, even for the ones that have extraordinary
computational resources. We therefore use an iterative method, that gives an
approximate solution: the power method. In this particular case, stopping the
iterations at the step $k$ means that we consider only the pages $p_j$ that
are $k$ clicks away from $p_i$\footnote{For semplicity reasons we disregard the
possibility to introduce a \emph{damping fatctor} \cite{Brin1998}, that does
not modify the basic principle of the algorithm.}.
